Variables

nombre descripcion
TimePermCG5 Tiempo permanencia
PermanentCG5 variable de status: 1 tiene la plaza
MobilityPG24 Movilidad
Q6 dicipline
Q23 Sex
Q24 Age
Q25 Single
Q27 Children (ver variable Parenthood tb )
Countryb Country
foreign Extranjero
foreignEdu2 Si te has educado en un país extranjero
multidisc Mobility disciplinar
publ_prod2 Publications

Nota sobre survreg: survreg() fits accelerated failure models, not proportional hazards models. The coefficients are logarithms of ratios of survival times, so a positive coefficient means longer survival.

Nota para mi: utilizar función survplot dentro de la librería rms, ver http://rstudio-pubs-static.s3.amazonaws.com/5564_bc9e2d9a458c4660aa82882df90b7a6b.html ## Lectura de datos desde fichero de stata

Con la librería foreign podemos leer datos de spss, stata, etc.

library(foreign)
datos <- read.dta("../data/working_10_Cleaner.dta")
head( names(datos))
## [1] "Career_stage"      "CodeNumber_enc"    "CodeNumber"       
## [4] "Countryb"          "Current_Institute" "FieldofScience"

Nos quedamos sólo con las variables que nos hacen falta

datos <- datos[,c("ability","Countryb","Q6","Q23","Q24","Q25","Q27",
                        "Parenthood", "foreign", "foreignEdu2",
                        "multidisc", "TimePermCG5","PermanentCG5",
                        "mobilityPG24")]

La variable Countryb tiene como niveles todos los países, elimino las etiquetas de países que no están en los datos y elijo UK como el país de referencia

datos$Countryb <- droplevels(as.factor(as.character(datos$Countryb)))
datos$Countryb <- relevel(datos$Countryb,ref="united kingdom")

Defino la variable time como TimePermCG y status como PermanentCG

datos$time <- datos$TimePermCG5

datos$status <- datos$PermanentCG5

Comprobamos si hay casos perdidos

sum( is.na( datos$time))
## [1] 36
sum( is.na( datos$status))
## [1] 0

Eliminamos los perdidos en time

datos <- datos[!is.na(datos$time),]

Tenemos observaciones censuradas, es decir, entrevistados que no han obtenido la plaza ni sabemos cuando la obtendrán. Unas 1813

table(datos$status)
## 
##    0    1 
## 1813 4228

Por otro lado también tenemos entrevistados que en time=0 ya tienen la plaza (status=1), una posible solución es sumar 0.5 a time. Hay 1243 casos así

 head(with(datos, table(time, status)),10)
##     status
## time   0    1
##    0  69 1243
##    1  87  462
##    2  96  394
##    3 127  354
##    4 162  308
##    5 182  258
##    6 177  250
##    7 180  182
##    8 132  157
##    9 100  107

sumamos 0.001 a time

datos$time <- datos$time +0.001

En la variable de ability tengo 43 valores inferiores a 0, preguntar a Ana sobre esta variable, como está construida y si tiene sentido que sea menor que 0. También hay 1627 casos perdidos en esta variable

table(datos$ability<0, exclude = NULL)
## 
## FALSE  TRUE  <NA> 
##  4371    43  1627

Eliminar niveles sobrantes en Q6 (disciplina)

table(datos$Q6)
## 
##   Same as current position      Agricultural Sciences 
##                          0                        111 
## Engineering and Technology                 Humanities 
##                       1317                        792 
##           Medical Sciences           Natural Sciences 
##                        710                       1446 
##            Social Sciences 
##                       1623
datos$Q6 <- droplevels(datos$Q6)

table(datos$Q6)
## 
##      Agricultural Sciences Engineering and Technology 
##                        111                       1317 
##                 Humanities           Medical Sciences 
##                        792                        710 
##           Natural Sciences            Social Sciences 
##                       1446                       1623

Curvas de supervivencia por Kaplan-Meier

Utilizamos la librería survival

library(survival)
library(rms)
## Loading required package: Hmisc
## Loading required package: grid
## Loading required package: lattice
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
## 
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## 
## The following object is masked from 'package:base':
## 
##     backsolve

El objeto fundamental en análisis de supervivencia es Surv(time,status)

S1 <- with(datos, Surv(time,status))

Un objeto de tipo Surv guarda atributos como que variables se han utilizado, el tipo de censura (right en este caso), etc.

attributes(S1)
## $dim
## [1] 6041    2
## 
## $dimnames
## $dimnames[[1]]
## NULL
## 
## $dimnames[[2]]
## [1] "time"   "status"
## 
## 
## $type
## [1] "right"
## 
## $class
## [1] "Surv"

Si vemos los primeros valores, la cruz puesta después del 12.5 indica que ese individuo está censurado, no tiene plaza en el momento en que se le entrevistó, ni sabemos cuando la obtendrá, pero se puede utilizar para el análisis, ya que sabemos que han pasado 12 ¿años? y todavía no la tiene

S1[1:6]
## [1]  1.001   5.001  14.001+ 12.001+  2.001   1.001

Si lo vemos en los datos, vemos que el caso 4 no ha obtenido la plaza (status=0)

datos[1:6, c("time","status")]
##     time status
## 1  1.001      1
## 2  5.001      1
## 3 14.001      0
## 4 12.001      0
## 5  2.001      1
## 6  1.001      1

Función de supervivencia de toda la población

Con survfit podemos obtener la curva de supervivencia utilizando el estimador de Kaplan-Meier. Nota (cambiamos por función npsurv de la librería rms que hace lo mismo, pero luego salen gráficos más claros)

survplot(npsurv(S1 ~ 1))

Seleccionando time entre 0 y 20

plot(npsurv(S1 ~ 1,), xlim=c(0,20))

Función de supervivencia según movilidad

Con survfit también podemos ver si hay diferencias por movilidad.

En primer lugar vemos las medias y medianas de time por movilidad

# todos datos, incluyendo los censurados
with(datos, tapply(time,mobilityPG24, function(x)c(media=mean(x),mediana=median(x))))
## $`0`
##    media  mediana 
## 4.525196 3.001000 
## 
## $`1`
##    media  mediana 
## 5.687516 5.001000
# solo lso que tienen status=1

with(datos[datos$status==1,], tapply(time,mobilityPG24, function(x)c(media=mean(x),mediana=median(x))))
## $`0`
##    media  mediana 
## 3.768638 2.001000 
## 
## $`1`
##   media mediana 
## 4.48063 3.00100

Si se observan diferencias,con mobility=0 hay menos survival, es decir, la gente que no se mueve obtiene la plaza antes. (obtener plaza es el evento modelado, seguir survival significa seguir sin plaza)

Dibujamos las dos curvas de supervivencia

mod1 <- npsurv(S1 ~ mobilityPG24 , datos)
survplot(mod1, ylab=expression(hat(S)(t)))

Para cada time y en cada categoría de mobilityPG24, se ha calculada el valor de survival y el intervalo de confianza. No se aprecian diferencias entre movilidad 0 y movilidad 1

summary(mod1)
## Call: npsurv(formula = S1 ~ mobilityPG24, data = datos)
## 
## 115 observations deleted due to missingness 
##                 mobilityPG24=0 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001   3575     902   0.7477 0.00726      0.73359       0.7621
##   1.001   2616     311   0.6588 0.00796      0.64339       0.6746
##   2.001   2240     216   0.5953 0.00828      0.57926       0.6117
##   3.001   1959     172   0.5430 0.00846      0.52668       0.5598
##   4.001   1695     164   0.4905 0.00858      0.47394       0.5076
##   5.001   1424     150   0.4388 0.00865      0.42218       0.4561
##   6.001   1175     128   0.3910 0.00868      0.37436       0.4084
##   7.001    956     105   0.3481 0.00868      0.33146       0.3655
##   8.001    757      86   0.3085 0.00868      0.29197       0.3260
##   9.001    610      66   0.2751 0.00866      0.25868       0.2926
##  10.001    505      68   0.2381 0.00858      0.22186       0.2555
##  11.001    395      45   0.2110 0.00850      0.19495       0.2283
##  12.001    312      41   0.1832 0.00841      0.16747       0.2005
##  13.001    251      31   0.1606 0.00830      0.14514       0.1777
##  14.001    204      25   0.1409 0.00816      0.12580       0.1579
##  15.001    166      18   0.1256 0.00803      0.11085       0.1424
##  16.001    135      12   0.1145 0.00794      0.09993       0.1311
##  17.001    115       9   0.1055 0.00786      0.09119       0.1221
##  18.001     95       8   0.0966 0.00780      0.08249       0.1132
##  19.001     80       2   0.0942 0.00779      0.08012       0.1108
##  20.001     77      11   0.0808 0.00766      0.06705       0.0973
##  21.001     61       9   0.0688 0.00749      0.05562       0.0852
##  22.001     52       1   0.0675 0.00746      0.05437       0.0838
##  23.001     50       7   0.0581 0.00722      0.04550       0.0741
##  24.001     42       9   0.0456 0.00676      0.03412       0.0610
##  25.001     30       1   0.0441 0.00671      0.03274       0.0594
##  27.001     26       4   0.0373 0.00648      0.02656       0.0524
##  28.001     20       2   0.0336 0.00634      0.02320       0.0486
##  29.001     16       2   0.0294 0.00621      0.01943       0.0445
##  31.001     12       1   0.0269 0.00615      0.01722       0.0421
##  34.001      7       1   0.0231 0.00636      0.01345       0.0396
##  39.001      3       1   0.0154 0.00758      0.00586       0.0404
## 
##                 mobilityPG24=1 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001   2351     341   0.8550 0.00726       0.8408       0.8693
##   1.001   1999     151   0.7904 0.00840       0.7741       0.8070
##   2.001   1829     178   0.7135 0.00936       0.6953       0.7320
##   3.001   1621     182   0.6333 0.01001       0.6140       0.6533
##   4.001   1407     144   0.5685 0.01034       0.5486       0.5892
##   5.001   1211     108   0.5178 0.01051       0.4976       0.5388
##   6.001   1025     122   0.4562 0.01064       0.4358       0.4775
##   7.001    826      77   0.4137 0.01069       0.3932       0.4352
##   8.001    669      71   0.3698 0.01075       0.3493       0.3915
##   9.001    533      41   0.3413 0.01080       0.3208       0.3632
##  10.001    437      41   0.3093 0.01089       0.2887       0.3314
##  11.001    340      28   0.2838 0.01100       0.2631       0.3062
##  12.001    262      33   0.2481 0.01124       0.2270       0.2711
##  13.001    210      24   0.2197 0.01135       0.1986       0.2431
##  14.001    162      15   0.1994 0.01145       0.1782       0.2231
##  15.001    134      13   0.1800 0.01153       0.1588       0.2041
##  16.001    113       5   0.1721 0.01155       0.1509       0.1963
##  17.001     97      12   0.1508 0.01164       0.1296       0.1754
##  18.001     81       5   0.1415 0.01165       0.1204       0.1662
##  19.001     71       6   0.1295 0.01164       0.1086       0.1545
##  20.001     59       5   0.1185 0.01164       0.0978       0.1437
##  21.001     45       5   0.1054 0.01174       0.0847       0.1311
##  22.001     35       3   0.0963 0.01184       0.0757       0.1226
##  23.001     30       3   0.0867 0.01189       0.0663       0.1134
##  25.001     23       2   0.0792 0.01199       0.0588       0.1065
##  26.001     19       1   0.0750 0.01206       0.0547       0.1028
##  31.001      9       1   0.0667 0.01329       0.0451       0.0985
##  34.001      7       1   0.0571 0.01441       0.0349       0.0937
##  37.001      3       1   0.0381 0.01828       0.0149       0.0976
##  43.001      1       1   0.0000     NaN           NA           NA

Para ver el tiempo mediano hasta obtener la plaza se puede ver

mod1
## Call: npsurv(formula = S1 ~ mobilityPG24, data = datos)
## 
##    115 observations deleted due to missingness 
##                   n events median 0.95LCL 0.95UCL
## mobilityPG24=0 3575   2608      4       4       5
## mobilityPG24=1 2351   1620      6       5       6

Conclusión: En principio si hay diferncias según movilidad.

Función de supervivencia según disciplina

# todos datos, incluyendo los censurados
with(datos, tapply(time,Q6, function(x)c(media=mean(x),mediana=median(x))))
## $`Agricultural Sciences`
##    media  mediana 
## 6.406405 5.001000 
## 
## $`Engineering and Technology`
##    media  mediana 
## 4.831676 3.001000 
## 
## $Humanities
##    media  mediana 
## 5.218172 4.001000 
## 
## $`Medical Sciences`
##    media  mediana 
## 6.072831 5.001000 
## 
## $`Natural Sciences`
##    media  mediana 
## 5.927694 5.001000 
## 
## $`Social Sciences`
##    media  mediana 
## 4.287506 3.001000
# solo lso que tienen status=1

with(datos[datos$status==1,], tapply(time,Q6, function(x)c(media=mean(x),mediana=median(x))))
## $`Agricultural Sciences`
##    media  mediana 
## 5.410639 4.001000 
## 
## $`Engineering and Technology`
##   media mediana 
## 3.86575 2.00100 
## 
## $Humanities
##    media  mediana 
## 3.795224 2.001000 
## 
## $`Medical Sciences`
##    media  mediana 
## 4.898059 4.001000 
## 
## $`Natural Sciences`
##    media  mediana 
## 4.871999 4.001000 
## 
## $`Social Sciences`
##    media  mediana 
## 3.054957 1.001000

Dibujamos las curvas de supervivencia

# elijo una paleta de colores
library(RColorBrewer)
colores <- brewer.pal(10,"Set3")
mod2 <- npsurv(S1 ~ Q6 , datos)
survplot(mod2, col = colores[1:6],
         ylab = expression(hat(S)(t)),
         time.inc = 5,
         conf = "none")

summary(mod2)
## Call: npsurv(formula = S1 ~ Q6, data = datos)
## 
## 42 observations deleted due to missingness 
##                 Q6=Agricultural Sciences 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001    111      20   0.8198  0.0365       0.7514        0.895
##   1.001     91       4   0.7838  0.0391       0.7108        0.864
##   2.001     85      10   0.6916  0.0440       0.6104        0.783
##   3.001     72       5   0.6435  0.0459       0.5596        0.740
##   4.001     66       4   0.6045  0.0471       0.5189        0.704
##   5.001     59       8   0.5226  0.0488       0.4351        0.628
##   6.001     50       4   0.4808  0.0492       0.3934        0.588
##   7.001     43       5   0.4249  0.0494       0.3383        0.534
##   8.001     34       3   0.3874  0.0496       0.3015        0.498
##   9.001     27       6   0.3013  0.0495       0.2184        0.416
##  10.001     21       3   0.2583  0.0482       0.1791        0.372
##  11.001     17       2   0.2279  0.0471       0.1520        0.342
##  12.001     14       1   0.2116  0.0465       0.1376        0.325
##  13.001     12       1   0.1940  0.0458       0.1221        0.308
##  15.001     11       1   0.1763  0.0449       0.1070        0.291
##  16.001     10       1   0.1587  0.0438       0.0924        0.272
##  17.001      9       1   0.1411  0.0423       0.0784        0.254
##  20.001      7       1   0.1209  0.0408       0.0624        0.234
##  22.001      6       1   0.1008  0.0386       0.0475        0.214
##  25.001      5       2   0.0605  0.0320       0.0214        0.171
## 
##                 Q6=Engineering and Technology 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001   1317     276   0.7904  0.0112       0.7688       0.8127
##   1.001   1017     125   0.6933  0.0128       0.6687       0.7188
##   2.001    874      91   0.6211  0.0135       0.5952       0.6481
##   3.001    765      84   0.5529  0.0139       0.5263       0.5809
##   4.001    657      68   0.4957  0.0141       0.4688       0.5241
##   5.001    556      56   0.4457  0.0142       0.4188       0.4744
##   6.001    464      45   0.4025  0.0142       0.3757       0.4313
##   7.001    369      41   0.3578  0.0142       0.3310       0.3868
##   8.001    286      28   0.3228  0.0143       0.2959       0.3520
##   9.001    244      18   0.2990  0.0143       0.2722       0.3283
##  10.001    204      19   0.2711  0.0143       0.2444       0.3007
##  11.001    162      16   0.2443  0.0144       0.2177       0.2742
##  12.001    123      16   0.2126  0.0145       0.1859       0.2431
##  13.001     97      17   0.1753  0.0145       0.1490       0.2062
##  14.001     73       2   0.1705  0.0145       0.1443       0.2015
##  15.001     68       5   0.1580  0.0145       0.1320       0.1891
##  16.001     58       1   0.1552  0.0145       0.1293       0.1864
##  17.001     52       5   0.1403  0.0146       0.1145       0.1720
##  18.001     43       5   0.1240  0.0146       0.0985       0.1561
##  19.001     36       2   0.1171  0.0146       0.0918       0.1494
##  20.001     32       4   0.1025  0.0145       0.0777       0.1351
##  21.001     26       3   0.0906  0.0143       0.0665       0.1235
##  23.001     22       4   0.0742  0.0139       0.0514       0.1070
##  24.001     17       2   0.0654  0.0136       0.0436       0.0982
##  25.001     15       1   0.0611  0.0133       0.0398       0.0937
##  27.001     13       2   0.0517  0.0128       0.0318       0.0841
##  31.001      7       1   0.0443  0.0129       0.0250       0.0786
##  34.001      3       1   0.0295  0.0148       0.0110       0.0790
##  37.001      1       1   0.0000     NaN           NA           NA
## 
##                 Q6=Humanities 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001    792     196   0.7525  0.0153       0.7231        0.783
##   1.001    585      66   0.6676  0.0168       0.6355        0.701
##   2.001    507      46   0.6071  0.0175       0.5737        0.642
##   3.001    456      40   0.5538  0.0179       0.5199        0.590
##   4.001    397      27   0.5161  0.0181       0.4819        0.553
##   5.001    345      29   0.4728  0.0182       0.4383        0.510
##   6.001    291      27   0.4289  0.0184       0.3943        0.467
##   7.001    243      25   0.3848  0.0185       0.3501        0.423
##   8.001    203      20   0.3469  0.0185       0.3124        0.385
##   9.001    166      13   0.3197  0.0185       0.2853        0.358
##  10.001    139      11   0.2944  0.0186       0.2601        0.333
##  11.001    123       6   0.2800  0.0186       0.2459        0.319
##  12.001    106      11   0.2510  0.0186       0.2170        0.290
##  13.001     84       9   0.2241  0.0186       0.1904        0.264
##  14.001     68       6   0.2043  0.0187       0.1708        0.244
##  15.001     57       4   0.1900  0.0187       0.1567        0.230
##  16.001     49       5   0.1706  0.0187       0.1376        0.211
##  17.001     40       4   0.1535  0.0187       0.1210        0.195
##  18.001     32       1   0.1487  0.0187       0.1163        0.190
##  19.001     29       1   0.1436  0.0187       0.1112        0.185
##  21.001     26       1   0.1381  0.0188       0.1057        0.180
##  22.001     23       1   0.1321  0.0189       0.0997        0.175
##  23.001     21       1   0.1258  0.0190       0.0935        0.169
##  27.001     13       1   0.1161  0.0199       0.0830        0.162
##  28.001     11       1   0.1056  0.0207       0.0719        0.155
##  39.001      2       1   0.0528  0.0387       0.0125        0.222
##  43.001      1       1   0.0000     NaN           NA           NA
## 
##                 Q6=Medical Sciences 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001    710     112   0.8423  0.0137       0.8159        0.869
##   1.001    588      36   0.7907  0.0153       0.7613        0.821
##   2.001    537      36   0.7377  0.0166       0.7058        0.771
##   3.001    484      41   0.6752  0.0179       0.6411        0.711
##   4.001    425      33   0.6228  0.0187       0.5872        0.660
##   5.001    372      30   0.5725  0.0193       0.5360        0.612
##   6.001    323      42   0.4981  0.0199       0.4606        0.539
##   7.001    269      29   0.4444  0.0201       0.4067        0.486
##   8.001    217      25   0.3932  0.0202       0.3555        0.435
##   9.001    179      17   0.3559  0.0202       0.3183        0.398
##  10.001    151      15   0.3205  0.0202       0.2833        0.363
##  11.001    122      13   0.2864  0.0201       0.2495        0.329
##  12.001     99       8   0.2632  0.0201       0.2266        0.306
##  13.001     89       9   0.2366  0.0199       0.2006        0.279
##  14.001     72       9   0.2070  0.0197       0.1718        0.250
##  15.001     57       7   0.1816  0.0195       0.1471        0.224
##  16.001     45       2   0.1735  0.0195       0.1393        0.216
##  17.001     40       2   0.1649  0.0194       0.1309        0.208
##  18.001     35       1   0.1601  0.0194       0.1262        0.203
##  19.001     29       2   0.1491  0.0196       0.1152        0.193
##  20.001     26       2   0.1376  0.0197       0.1040        0.182
##  21.001     19       2   0.1231  0.0201       0.0894        0.170
##  22.001     17       1   0.1159  0.0202       0.0824        0.163
##  23.001     13       1   0.1070  0.0205       0.0735        0.156
##  24.001     11       1   0.0973  0.0208       0.0639        0.148
## 
##                 Q6=Natural Sciences 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001   1446     175   0.8790 0.00858       0.8623       0.8959
##   1.001   1262      74   0.8274 0.00995       0.8082       0.8472
##   2.001   1175     124   0.7401 0.01158       0.7178       0.7632
##   3.001   1039     105   0.6653 0.01250       0.6413       0.6903
##   4.001    916     117   0.5803 0.01315       0.5551       0.6067
##   5.001    778      81   0.5199 0.01338       0.4943       0.5468
##   6.001    656      83   0.4541 0.01350       0.4284       0.4814
##   7.001    529      44   0.4164 0.01352       0.3907       0.4437
##   8.001    437      51   0.3678 0.01355       0.3422       0.3953
##   9.001    346      27   0.3391 0.01357       0.3135       0.3667
##  10.001    284      39   0.2925 0.01360       0.2670       0.3204
##  11.001    207      20   0.2642 0.01368       0.2388       0.2925
##  12.001    160      26   0.2213 0.01381       0.1958       0.2501
##  13.001    121      12   0.1994 0.01381       0.1740       0.2284
##  14.001     97      14   0.1706 0.01380       0.1456       0.1999
##  15.001     77       5   0.1595 0.01376       0.1347       0.1889
##  16.001     67       5   0.1476 0.01372       0.1230       0.1771
##  17.001     58       5   0.1349 0.01367       0.1106       0.1645
##  18.001     49       4   0.1239 0.01362       0.0999       0.1537
##  19.001     45       1   0.1211 0.01359       0.0972       0.1509
##  20.001     42       4   0.1096 0.01346       0.0861       0.1394
##  21.001     33       4   0.0963 0.01337       0.0734       0.1264
##  23.001     25       3   0.0847 0.01333       0.0623       0.1153
##  24.001     20       3   0.0720 0.01319       0.0503       0.1031
##  26.001     15       1   0.0672 0.01316       0.0458       0.0987
##  27.001     12       1   0.0616 0.01320       0.0405       0.0938
##  29.001      9       2   0.0479 0.01336       0.0278       0.0828
##  34.001      5       1   0.0383 0.01370       0.0190       0.0772
## 
##                 Q6=Social Sciences 
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.001   1623     461   0.7160  0.0112       0.6944       0.7382
##   1.001   1147     150   0.6223  0.0121       0.5991       0.6464
##   2.001    971      82   0.5698  0.0124       0.5461       0.5945
##   3.001    848      74   0.5201  0.0126       0.4960       0.5453
##   4.001    727      58   0.4786  0.0127       0.4543       0.5041
##   5.001    610      51   0.4386  0.0128       0.4142       0.4644
##   6.001    500      46   0.3982  0.0129       0.3736       0.4244
##   7.001    407      38   0.3610  0.0131       0.3363       0.3875
##   8.001    323      28   0.3297  0.0132       0.3049       0.3566
##   9.001    251      26   0.2956  0.0134       0.2704       0.3231
##  10.001    208      22   0.2643  0.0136       0.2390       0.2923
##  11.001    162      15   0.2398  0.0137       0.2144       0.2682
##  12.001    126      12   0.2170  0.0139       0.1914       0.2460
##  13.001    107       6   0.2048  0.0140       0.1792       0.2341
##  14.001     94       9   0.1852  0.0141       0.1596       0.2150
##  15.001     77       9   0.1636  0.0142       0.1380       0.1938
##  16.001     65       3   0.1560  0.0142       0.1306       0.1864
##  17.001     54       3   0.1474  0.0142       0.1219       0.1781
##  18.001     46       2   0.1409  0.0143       0.1155       0.1720
##  19.001     40       2   0.1339  0.0144       0.1084       0.1654
##  20.001     34       5   0.1142  0.0148       0.0886       0.1471
##  21.001     24       4   0.0952  0.0151       0.0698       0.1298
##  22.001     18       1   0.0899  0.0151       0.0646       0.1250
##  23.001     16       1   0.0843  0.0152       0.0592       0.1200
##  24.001     15       3   0.0674  0.0149       0.0437       0.1041
##  31.001      6       1   0.0562  0.0161       0.0320       0.0986

Para ver el tiempo mediano hasta obtener la plaza se puede ver

mod2
## Call: npsurv(formula = S1 ~ Q6, data = datos)
## 
##    42 observations deleted due to missingness 
##                                  n events median 0.95LCL 0.95UCL
## Q6=Agricultural Sciences       111     83      6       5       8
## Q6=Engineering and Technology 1317    939      4       4       5
## Q6=Humanities                  792    554      5       4       6
## Q6=Medical Sciences            710    476      6       6       7
## Q6=Natural Sciences           1446   1031      6       5       6
## Q6=Social Sciences            1623   1112      4       3       5

Si parece que según la disciplina las curvas de supervivencia son distintas, podríamos pensar en ir ajustando modelos de regresión paramétricos

Modelos paramétricos con survreg

Con survreg podemos construir modelos paramétricos (la otra opción son los semiparamétricos como los de regresión de cox)

Se pueden considerar varias distribuciones para la distribución del riesgo según se incrementa el tiempo (age-specific hazard models). Sólo pongo la exponencial y la weibull

Distribución Riesgo (hazard)
Exponencial \(constante = \dfrac{1}{\mu} = \lambda\)
Weibull \(\alpha\lambda\left(\lambda t\right)^{\alpha-1}\)

La distribución Weibull es la más flexible ya que puede tratar co hazards que se incrementan con el tiempo de forma aceleraca \((\alpha >1)\) o decelerada \((\alpha<1)\)

Veamos en primer lugar el modelo con var indep las disciplinas y dist exponencial. La categoría de referencia de Q6 es Agricultural Sciences

mod3 <- survreg(S1 ~ Q6, data=datos, dist ="exponential")
summary(mod3)
## 
## Call:
## survreg(formula = S1 ~ Q6, data = datos, dist = "exponential")
##                                Value Std. Error      z        p
## (Intercept)                   2.1480      0.110 19.569 2.84e-85
## Q6Engineering and Technology -0.2345      0.115 -2.048 4.06e-02
## Q6Humanities                 -0.1384      0.118 -1.176 2.39e-01
## Q6Medical Sciences            0.0557      0.119  0.468 6.40e-01
## Q6Natural Sciences           -0.0301      0.114 -0.264 7.92e-01
## Q6Social Sciences            -0.3142      0.114 -2.761 5.76e-03
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -12555.1   Loglik(intercept only)= -12592
##  Chisq= 73.93 on 5 degrees of freedom, p= 1.6e-14 
## Number of Newton-Raphson Iterations: 7 
## n=5999 (42 observations deleted due to missingness)

La tabla con los coeficientes y p-valores, redondeando a 3 decimales

round(summary(mod3)$table,3)
##                               Value Std. Error      z     p
## (Intercept)                   2.148      0.110 19.569 0.000
## Q6Engineering and Technology -0.234      0.115 -2.048 0.041
## Q6Humanities                 -0.138      0.118 -1.176 0.239
## Q6Medical Sciences            0.056      0.119  0.468 0.640
## Q6Natural Sciences           -0.030      0.114 -0.264 0.792
## Q6Social Sciences            -0.314      0.114 -2.761 0.006

Bajo la hipótesis de errores exponenciales, vemos que la supervivencia es significativamente distinta de Agricultural Sciences para la categoría de Social Science.

Antes de considerar si agrupamos categorías, veamos si es mejor considerar una distribución Weibull

# la Weibull es la distribución por defecto en survreg y no hace falta especificarlo
mod4 <- survreg(S1 ~ Q6, data=datos)
summary(mod4)
## 
## Call:
## survreg(formula = S1 ~ Q6, data = datos)
##                               Value Std. Error      z        p
## (Intercept)                   2.119     0.3016  7.026 2.12e-12
## Q6Engineering and Technology -0.236     0.3146 -0.750 4.53e-01
## Q6Humanities                 -0.192     0.3234 -0.592 5.54e-01
## Q6Medical Sciences            0.292     0.3269  0.894 3.71e-01
## Q6Natural Sciences            0.206     0.3135  0.657 5.11e-01
## Q6Social Sciences            -0.388     0.3127 -1.241 2.15e-01
## Log(scale)                    1.011     0.0141 71.886 0.00e+00
## 
## Scale= 2.75 
## 
## Weibull distribution
## Loglik(model)= -8725.6   Loglik(intercept only)= -8744.6
##  Chisq= 38.06 on 5 degrees of freedom, p= 3.7e-07 
## Number of Newton-Raphson Iterations: 7 
## n=5999 (42 observations deleted due to missingness)
round(summary(mod4)$table,4)
##                                Value Std. Error       z      p
## (Intercept)                   2.1192     0.3016  7.0261 0.0000
## Q6Engineering and Technology -0.2359     0.3146 -0.7496 0.4535
## Q6Humanities                 -0.1916     0.3234 -0.5923 0.5536
## Q6Medical Sciences            0.2922     0.3269  0.8938 0.3714
## Q6Natural Sciences            0.2060     0.3135  0.6572 0.5111
## Q6Social Sciences            -0.3880     0.3127 -1.2409 0.2146
## Log(scale)                    1.0108     0.0141 71.8863 0.0000

El parámetro de scale=1.11 indica que la pendiente de la función hazard se incrementa con el tiempo. Al igual que antes, las categorías significativamente distintas son Engineering and Technology Humanities y Social Science.

Comparo los dos modelos usando la función anova

anova(mod3, mod4)
##   Terms Resid. Df    -2*LL Test Df Deviance Pr(>Chi)
## 1    Q6      5993 25110.11      NA       NA       NA
## 2    Q6      5992 17451.12    =  1 7658.998        0

y es mejor el modelo con la distribución Weibull

Podemos probar a unir todos los niveles salvo Social Sciences

datos$Q6_rec <- datos$Q6
levels(datos$Q6_rec)
## [1] "Agricultural Sciences"      "Engineering and Technology"
## [3] "Humanities"                 "Medical Sciences"          
## [5] "Natural Sciences"           "Social Sciences"
levels(datos$Q6_rec)[1:5] <- "Grupo1"
levels(datos$Q6_rec)
## [1] "Grupo1"          "Social Sciences"
mod5 <- survreg(S1 ~ Q6_rec, data=datos)
anova(mod4,mod5)
##    Terms Resid. Df    -2*LL Test Df  Deviance     Pr(>Chi)
## 1     Q6      5992 17451.12      NA        NA           NA
## 2 Q6_rec      5996 17471.87    = -4 -20.75325 0.0003544068

Al ser el p_valor menor de 0.5 significa que ambos modelos son distintos y no está justificado unir esas categorías.

Probemos en unir Agricultural sciences con Natural Sciences y Medical Scienes. Machacamos mod5

datos$Q6_rec <- datos$Q6
levels(datos$Q6_rec)
## [1] "Agricultural Sciences"      "Engineering and Technology"
## [3] "Humanities"                 "Medical Sciences"          
## [5] "Natural Sciences"           "Social Sciences"
levels(datos$Q6_rec)[c(1,4,5)] <- "Grupo1"
levels(datos$Q6_rec)
## [1] "Grupo1"                     "Engineering and Technology"
## [3] "Humanities"                 "Social Sciences"
mod5 <- survreg(S1 ~ Q6_rec, data=datos)
anova(mod4,mod5)
##    Terms Resid. Df    -2*LL Test Df   Deviance  Pr(>Chi)
## 1     Q6      5992 17451.12      NA         NA        NA
## 2 Q6_rec      5994 17451.99    = -2 -0.8752751 0.6455597

Estos dos modelos si son equivalentes, podemos unir esos niveles

summary(mod5)
## 
## Call:
## survreg(formula = S1 ~ Q6_rec, data = datos)
##                                   Value Std. Error     z         p
## (Intercept)                       2.341     0.0693 33.78 3.95e-250
## Q6_recEngineering and Technology -0.458     0.1132 -4.04  5.24e-05
## Q6_recHumanities                 -0.413     0.1356 -3.05  2.31e-03
## Q6_recSocial Sciences            -0.610     0.1075 -5.67  1.42e-08
## Log(scale)                        1.011     0.0141 71.89  0.00e+00
## 
## Scale= 2.75 
## 
## Weibull distribution
## Loglik(model)= -8726   Loglik(intercept only)= -8744.6
##  Chisq= 37.18 on 3 degrees of freedom, p= 4.2e-08 
## Number of Newton-Raphson Iterations: 7 
## n=5999 (42 observations deleted due to missingness)

Tomamos como referencia “Social Sciences”, para ver si hay diferencias

datos$Q6_rec2 <- relevel(datos$Q6_rec,ref="Social Sciences")

mod6 <- survreg(S1 ~ Q6_rec2, data=datos)

round(summary(mod6)$table,3)
##                                   Value Std. Error      z     p
## (Intercept)                       1.731      0.082 21.001 0.000
## Q6_rec2Grupo1                     0.610      0.108  5.671 0.000
## Q6_rec2Engineering and Technology 0.152      0.122  1.249 0.212
## Q6_rec2Humanities                 0.196      0.143  1.375 0.169
## Log(scale)                        1.011      0.014 71.888 0.000

Parece que entre Social sciences y Engineering and Technology no hay diferencias. Unamos a ver si se puede simplificar el modelo.

datos$Q6_rec3 <- datos$Q6_rec2
levels(datos$Q6_rec3)
## [1] "Social Sciences"            "Grupo1"                    
## [3] "Engineering and Technology" "Humanities"
levels(datos$Q6_rec3)[c(1,3)] <- "Grupo2"
levels(datos$Q6_rec3)
## [1] "Grupo2"     "Grupo1"     "Humanities"
mod7 <- survreg(S1 ~ Q6_rec3, data = datos)
anova(mod6, mod7) 
##     Terms Resid. Df    -2*LL Test Df  Deviance  Pr(>Chi)
## 1 Q6_rec2      5994 17451.99      NA        NA        NA
## 2 Q6_rec3      5995 17453.55    = -1 -1.562549 0.2112924
round(summary(mod7)$table,3)
##                   Value Std. Error      z     p
## (Intercept)       1.802      0.061 29.662 0.000
## Q6_rec3Grupo1     0.539      0.092  5.864 0.000
## Q6_rec3Humanities 0.126      0.132  0.955 0.339
## Log(scale)        1.011      0.014 71.892 0.000

Grupo 1: Agricultural Science, Medical Science, Natural Science Grupo 2: Engineering and Technology, Social Scienc Grupo 3: Humanities

with(datos, table(Q6, Q6_rec3))
##                             Q6_rec3
## Q6                           Grupo2 Grupo1 Humanities
##   Agricultural Sciences           0    111          0
##   Engineering and Technology   1317      0          0
##   Humanities                      0      0        792
##   Medical Sciences                0    710          0
##   Natural Sciences                0   1446          0
##   Social Sciences              1623      0          0

La curva de supervivencia empírica es

survplot(npsurv(S1 ~ Q6_rec3, datos), lwd=2, col=colores[c(1,4,5)])

survfit(S1 ~ Q6_rec3, datos)
## Call: survfit(formula = S1 ~ Q6_rec3, data = datos)
## 
##    42 observations deleted due to missingness 
##                       n events median 0.95LCL 0.95UCL
## Q6_rec3=Grupo2     2940   2051      4       4       5
## Q6_rec3=Grupo1     2267   1590      6       6       6
## Q6_rec3=Humanities  792    554      5       4       6

Interacción entre mobility y Discipline

mod8 <- survfit(S1 ~ Q6 + mobilityPG24, data = datos)
plot(mod8, col=colores, lty=c(rep(1,2),rep(2,2),rep(3,2),rep(4,2), rep(5,2),
                              rep(6,2)), lwd=c(rep(2,6),rep(3,6)))

Veamos las medianas calculadas por survfit y sus intervalos de confianza

mod8
## Call: survfit(formula = S1 ~ Q6 + mobilityPG24, data = datos)
## 
##    156 observations deleted due to missingness 
##                                                  n events median 0.95LCL
## Q6=Agricultural Sciences, mobilityPG24=0        68     54      5       3
## Q6=Agricultural Sciences, mobilityPG24=1        39     29      6       5
## Q6=Engineering and Technology, mobilityPG24=0  876    646      4       4
## Q6=Engineering and Technology, mobilityPG24=1  414    293      5       4
## Q6=Humanities, mobilityPG24=0                  504    372      4       3
## Q6=Humanities, mobilityPG24=1                  264    182      6       5
## Q6=Medical Sciences, mobilityPG24=0            437    302      6       5
## Q6=Medical Sciences, mobilityPG24=1            259    174      7       6
## Q6=Natural Sciences, mobilityPG24=0            548    416      5       4
## Q6=Natural Sciences, mobilityPG24=1            885    615      6       6
## Q6=Social Sciences, mobilityPG24=0            1115    798      4       3
## Q6=Social Sciences, mobilityPG24=1             476    314      4       3
##                                               0.95UCL
## Q6=Agricultural Sciences, mobilityPG24=0            8
## Q6=Agricultural Sciences, mobilityPG24=1           10
## Q6=Engineering and Technology, mobilityPG24=0       5
## Q6=Engineering and Technology, mobilityPG24=1       6
## Q6=Humanities, mobilityPG24=0                       5
## Q6=Humanities, mobilityPG24=1                       7
## Q6=Medical Sciences, mobilityPG24=0                 7
## Q6=Medical Sciences, mobilityPG24=1                 8
## Q6=Natural Sciences, mobilityPG24=0                 6
## Q6=Natural Sciences, mobilityPG24=1                 7
## Q6=Social Sciences, mobilityPG24=0                  4
## Q6=Social Sciences, mobilityPG24=1                  6

Aquí hay algo interesante, el tiempo mediano en obtener la plaza es un año mayor para los que se movieron que para los que no, en todas las disciplinas, excepto para los de Social Sciences, cuya mediana es la misma. Si vemos las curvas para los de Social Sciences, vemos que la supervivencia es sistemáticamente mayor para los que se mueven qeu para los que no, es decir, si te mueves obtienes más tarde la plaza, aunque en Social Sciences esto es menos acusado.

Utilizando survreg

Nota: Añado Countryb, considero la triple interacción y uso step para ver que interacción es significativa

mod9 <- survreg(S1 ~ Q6 * mobilityPG24 * Countryb  , datos)
mod9.step <- step(mod9)
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Start:  AIC=17147.44
## S1 ~ Q6 * mobilityPG24 * Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
##                            Df   AIC
## - Q6:mobilityPG24:Countryb 45 17101
## <none>                        17147
## Warning in nobs.default(fit, use.fallback = TRUE): no 'nobs' method is
## available
## 
## Step:  AIC=17101.38
## S1 ~ Q6 + mobilityPG24 + Countryb + Q6:mobilityPG24 + Q6:Countryb + 
##     mobilityPG24:Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
##                         Df   AIC
## - Q6:Countryb           45 17060
## - Q6:mobilityPG24        5 17093
## <none>                     17101
## - mobilityPG24:Countryb  9 17118
## Warning in nobs.default(fit, use.fallback = TRUE): no 'nobs' method is
## available
## 
## Step:  AIC=17059.88
## S1 ~ Q6 + mobilityPG24 + Countryb + Q6:mobilityPG24 + mobilityPG24:Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
##                         Df   AIC
## - Q6:mobilityPG24        5 17051
## <none>                     17060
## - mobilityPG24:Countryb  9 17076
## Warning in nobs.default(fit, use.fallback = TRUE): no 'nobs' method is
## available
## 
## Step:  AIC=17051.37
## S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
##                         Df   AIC
## <none>                     17051
## - mobilityPG24:Countryb  9 17068
## - Q6                     5 17265

El procedimiento por pasos nos dice que la interacción significativa es la de movilidad y país. Esto va en línea con lo que pensábamos, la movilidad influye sobre los tiempos hasta conseguir la plaza de manera diferente en cada país.. 7

summary(mod9.step)
## 
## Call:
## survreg(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb, 
##     data = datos)
##                                    Value Std. Error      z        p
## (Intercept)                       1.2926      0.323  4.000 6.34e-05
## Q6Engineering and Technology     -0.1010      0.312 -0.324 7.46e-01
## Q6Humanities                     -0.1546      0.321 -0.481 6.30e-01
## Q6Medical Sciences                0.4314      0.325  1.328 1.84e-01
## Q6Natural Sciences                0.2309      0.312  0.741 4.59e-01
## Q6Social Sciences                -0.2019      0.311 -0.648 5.17e-01
## mobilityPG24                      0.7523      0.182  4.144 3.41e-05
## Countrybbelgium                   0.9766      0.373  2.617 8.88e-03
## Countrybfrance                    0.0681      0.193  0.353 7.24e-01
## Countrybgermany                   1.3793      0.218  6.332 2.43e-10
## Countrybitaly                     0.5620      0.165  3.416 6.35e-04
## Countrybnetherlands               0.1199      0.224  0.534 5.93e-01
## Countrybpoland                    0.4001      0.352  1.136 2.56e-01
## Countrybspain                    -0.4016      0.197 -2.039 4.15e-02
## Countrybsweden                    0.8114      0.195  4.167 3.09e-05
## Countrybswitzerland               1.0635      0.471  2.258 2.40e-02
## mobilityPG24:Countrybbelgium      0.3110      0.559  0.556 5.78e-01
## mobilityPG24:Countrybfrance      -0.0699      0.334 -0.209 8.34e-01
## mobilityPG24:Countrybgermany     -0.7901      0.337 -2.346 1.90e-02
## mobilityPG24:Countrybitaly       -1.0698      0.282 -3.793 1.49e-04
## mobilityPG24:Countrybnetherlands  0.0623      0.344  0.181 8.56e-01
## mobilityPG24:Countrybpoland      -0.7943      0.604 -1.316 1.88e-01
## mobilityPG24:Countrybspain        0.6661      0.328  2.029 4.25e-02
## mobilityPG24:Countrybsweden      -0.2841      0.305 -0.931 3.52e-01
## mobilityPG24:Countrybswitzerland -0.5108      0.536 -0.953 3.41e-01
## Log(scale)                        0.9924      0.014 70.859 0.00e+00
## 
## Scale= 2.7 
## 
## Weibull distribution
## Loglik(model)= -8499.7   Loglik(intercept only)= -8605.2
##  Chisq= 210.96 on 24 degrees of freedom, p= 0 
## Number of Newton-Raphson Iterations: 7 
## n=5885 (156 observations deleted due to missingness)
round(summary(mod9.step)$table, 3)
##                                   Value Std. Error      z     p
## (Intercept)                       1.293      0.323  4.000 0.000
## Q6Engineering and Technology     -0.101      0.312 -0.324 0.746
## Q6Humanities                     -0.155      0.321 -0.481 0.630
## Q6Medical Sciences                0.431      0.325  1.328 0.184
## Q6Natural Sciences                0.231      0.312  0.741 0.459
## Q6Social Sciences                -0.202      0.311 -0.648 0.517
## mobilityPG24                      0.752      0.182  4.144 0.000
## Countrybbelgium                   0.977      0.373  2.617 0.009
## Countrybfrance                    0.068      0.193  0.353 0.724
## Countrybgermany                   1.379      0.218  6.332 0.000
## Countrybitaly                     0.562      0.165  3.416 0.001
## Countrybnetherlands               0.120      0.224  0.534 0.593
## Countrybpoland                    0.400      0.352  1.136 0.256
## Countrybspain                    -0.402      0.197 -2.039 0.041
## Countrybsweden                    0.811      0.195  4.167 0.000
## Countrybswitzerland               1.064      0.471  2.258 0.024
## mobilityPG24:Countrybbelgium      0.311      0.559  0.556 0.578
## mobilityPG24:Countrybfrance      -0.070      0.334 -0.209 0.834
## mobilityPG24:Countrybgermany     -0.790      0.337 -2.346 0.019
## mobilityPG24:Countrybitaly       -1.070      0.282 -3.793 0.000
## mobilityPG24:Countrybnetherlands  0.062      0.344  0.181 0.856
## mobilityPG24:Countrybpoland      -0.794      0.604 -1.316 0.188
## mobilityPG24:Countrybspain        0.666      0.328  2.029 0.042
## mobilityPG24:Countrybsweden      -0.284      0.305 -0.931 0.352
## mobilityPG24:Countrybswitzerland -0.511      0.536 -0.953 0.341
## Log(scale)                        0.992      0.014 70.859 0.000

Las categorías de referencia en cada variable son Social Sciences para Q6, 0 para mobilityPG24 y United Kingdom para Countryb.. Lo podemos cambiar según que nos interese tener en cada categoría

Interpretaciones:

Un coeficiente positivo significa más supervivencia que la categoría de referencia.

S(t, x= United kingdom, movilidad 1 y Ciencias sociales) = exp{-exp(-1.293)^2.7 t ^2.7}

Ejemplo con España.

El coeficiente para Spain es -0.4 lo que implica que hay menos superviencia en España que en el Uk ( es decir, en España se consigue la plaza antes). Ahora bien, el coeficiente en España para los que se mueven es 0.66, lo que indica que hay mayor supervivencia para los que se mueven que para los que no.. Así, por un lado, se tiene que en España se obtiene la plaza antes, pero que moverse en España tiene un efecto negativo, (el que se fue a sevilla), comparado con el resto de países. Más aún, de los coeficiente positivos de las interacciones entre país y movilidad (en Bélgica, Francia y España), es el único con p_valor < 0.05.

Opinión personal mía . Con esta definición de permanent y movilidad España no sale bien parada, en general se tarda menos en obtener plaza. pero la movilidad influye muy negativamente.

# modelo 9 por kaplan meier

res.km <- 
## Define a function to plot survreg prediction by gender

survreg.curves <- function(model, col = "black",
                           toPredict,
                           seq.quantiles = seq(from = 0.00, to = 0.97, by = 0.01),...) {

    x = predict(model, newdata=toPredict,
                type="quantile",
                p = seq.quantiles)
    rownames(x) <- paste0("mobility = ",toPredict$mobilityPG24, " ,Discipline = ", toPredict$Q6, ", Country = ", toPredict$Countryb )  
    y = 1-seq.quantiles
    plot(x[1,],y,type="n", xlab="time", ylab="Survival", axes=F,xlim=c(0,20),...)
    axis(1)
    axis(2, at=c(0,0.25, 0.5, 0.75, 1))
    
    plotlines <- function(t){
        lines(x=t, y=y,...)
        
    }
    apply(x,1, plotlines)

}
# seleccionamos de los datos, los que sean Social Sciences, movilidad 1 y 
# que pertenezcan a Uk y a Spain

dat1 <-  data.frame(Q6 = "Social Sciences", mobilityPG24 = c(0,1),
                         Countryb = "spain" )

dat1
##                Q6 mobilityPG24 Countryb
## 1 Social Sciences            0    spain
## 2 Social Sciences            1    spain
survreg.curves(mod9.step, toPredict = dat1, main="mobility Yes vs No\n in Social Sciences in Spain")

## NULL
predict(mod9.step, newdata=dat1, type="response")
##        1        2 
## 1.991886 8.227654
# 
dat2 <- data.frame(Q6 = "Social Sciences", mobilityPG24 = c(0,1),
                         Countryb = "united kingdom" )
dat2 
##                Q6 mobilityPG24       Countryb
## 1 Social Sciences            0 united kingdom
## 2 Social Sciences            1 united kingdom
survreg.curves(mod9.step, toPredict = dat2, main="mobility Yes vs No\n in Social Sciences in United kingdom")

## NULL
predict(mod9.step, newdata=dat2, type="response")
##        1        2 
## 2.976423 6.315820
dat3 <- data.frame(Q6 = c("Social Sciences"),
                   mobilityPG24 = c(0,1),
                         Countryb = "italy" )

dat3
##                Q6 mobilityPG24 Countryb
## 1 Social Sciences            0    italy
## 2 Social Sciences            1    italy
predict(mod9.step, newdata=dat3, type="response")
##        1        2 
## 5.221099 3.801065
survreg.curves(mod9.step, toPredict=dat3)

## NULL

Otra forma de pintarlo. Con la definición matemática

Comparamos movilidad frente no movilidad para los de agricultural sciences en reino unido. Como no hay interacción entre disciplina y movilididad, la separación entre curvas será la misma para las otras disciplinas en el Reino Unido.

# Curva supervivencia estimada para la cat de referencia (Uk, agricultural sciences, no movilidad)
curve(exp(-(exp(-mod9.step$coef[1]) * x)^(1/mod9.step$scale)),from=0, to=20,
        col="red", main="Survival curve for Uk, \nin agricultural sciences discipline", xlab="Time", ylab = "Survival estimate")

# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1]-mod9.step$coef[7]) * x)^(1/mod9.step$scale)),
      from=0, to = 20,
        col="blue",add=TRUE)

legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)

# Curva supervivencia estimada para la cat Spain,  Social sciences, no movilidad)
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]) * x)^(1/mod9.step$scale)),from=0, to=20,
        col="red", main="Survival curve for Spain, \nin Social sciences discipline", xlab="Time", ylab = "Survival estimate")

# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]-mod9.step$coef[7]-mod9.step$coef[23]) * x)^(1/mod9.step$scale)),
      from=0, to = 20,
        col="blue",add=TRUE)

legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)

En Italia, las curvas de supervivencia son más parecidas, pero se invierte la relación, los que no se mueven tienen mayor supervivencia, es decir, obtienen la plaza después de los que se mueven.

Si nos fijamos en los coeficientes el de movilidad es de 0.75 y el de la interacción entre Italia y movilidad es de -1.07, con lo que en Italia el coef de movilidad es negativo e implica menos supervivencia (mayor probabilidad de tener la plaza antes) para los que se mueven. Esto también pasa en Alemania y Polonia , ambas con -0.79, pero en estos casos hacen que las curvas sean casi iguales para los que se mueven que para los que no. En el otro extremo están países como España o Bélgica.

# Curva supervivencia estimada para la cat Italia,  Social sciences, no movilidad)
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]) * x)^(1/mod9.step$scale)),from=0, to=20,
        col="red", main="Survival curve for Italy, \nin Social sciences discipline", xlab="Time", ylab = "Survival estimate")

# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]-mod9.step$coef[7]-mod9.step$coef[20]) * x)^(1/mod9.step$scale)),
      from=0, to = 20,
        col="blue",add=TRUE)

legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)

subdatos <- datos[ datos$Countryb=="italy" & datos$Q6=="Social Sciences",]
S2 <- with(subdatos, Surv(time,status))

plot(survfit(S2 ~ mobilityPG24, subdatos))

curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]) * x)^(1/mod9.step$scale)),from=0, to=20,
        col="red", add=TRUE)

# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]-mod9.step$coef[7]-mod9.step$coef[20]) * x)^(1/mod9.step$scale)),
      from=0, to = 20,
        col="blue",add=TRUE)

legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)

En españa

subdatos <- datos[ datos$Countryb=="spain" & datos$Q6=="Social Sciences",]
S2 <- with(subdatos, Surv(time,status))

plot(survfit(S2 ~ mobilityPG24, subdatos))

curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]) * x)^(1/mod9.step$scale)),from=0, to=20,
        col="red", add=TRUE)

# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]-mod9.step$coef[7]-mod9.step$coef[23]) * x)^(1/mod9.step$scale)),
      from=0, to = 20,
        col="blue",add=TRUE)

legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)

uk

subdatos <- datos[ datos$Countryb=="united kingdom" & datos$Q6=="Social Sciences",]
S2 <- with(subdatos, Surv(time,status))

plot(survfit(S2 ~ mobilityPG24, subdatos))

Enfoque de cox proportional hazard models

Partimos del último modelo

mod.cph <- coxph(S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb, 
    data = datos)

summary(mod.cph)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb, 
##     data = datos)
## 
##   n= 5885, number of events= 4195 
##    (156 observations deleted due to missingness)
## 
##                                      coef exp(coef) se(coef)      z
## Q6Engineering and Technology      0.05907   1.06085  0.11564  0.511
## Q6Humanities                      0.03976   1.04057  0.11925  0.333
## Q6Medical Sciences               -0.19196   0.82534  0.12045 -1.594
## Q6Natural Sciences               -0.07344   0.92919  0.11552 -0.636
## Q6Social Sciences                 0.08955   1.09368  0.11564  0.774
## mobilityPG24                     -0.25287   0.77657  0.06716 -3.765
## Countrybbelgium                  -0.36824   0.69195  0.13828 -2.663
## Countrybfrance                    0.10604   1.11187  0.07154  1.482
## Countrybgermany                  -0.54527   0.57968  0.08068 -6.758
## Countrybitaly                    -0.20681   0.81318  0.06096 -3.392
## Countrybnetherlands              -0.09036   0.91360  0.08330 -1.085
## Countrybpoland                   -0.22346   0.79974  0.13063 -1.711
## Countrybspain                     0.17865   1.19561  0.07318  2.441
## Countrybsweden                   -0.29893   0.74161  0.07217 -4.142
## Countrybswitzerland              -0.41366   0.66122  0.17463 -2.369
## mobilityPG24:Countrybbelgium     -0.13287   0.87557  0.20732 -0.641
## mobilityPG24:Countrybfrance      -0.09170   0.91238  0.12382 -0.741
## mobilityPG24:Countrybgermany      0.23511   1.26505  0.12484  1.883
## mobilityPG24:Countrybitaly        0.33619   1.39960  0.10462  3.213
## mobilityPG24:Countrybnetherlands  0.01108   1.01114  0.12763  0.087
## mobilityPG24:Countrybpoland       0.25837   1.29481  0.22391  1.154
## mobilityPG24:Countrybspain       -0.23049   0.79415  0.12172 -1.894
## mobilityPG24:Countrybsweden       0.07204   1.07470  0.11326  0.636
## mobilityPG24:Countrybswitzerland  0.15504   1.16771  0.19872  0.780
##                                  Pr(>|z|)    
## Q6Engineering and Technology     0.609491    
## Q6Humanities                     0.738785    
## Q6Medical Sciences               0.111006    
## Q6Natural Sciences               0.524933    
## Q6Social Sciences                0.438692    
## mobilityPG24                     0.000166 ***
## Countrybbelgium                  0.007744 ** 
## Countrybfrance                   0.138276    
## Countrybgermany                  1.40e-11 ***
## Countrybitaly                    0.000693 ***
## Countrybnetherlands              0.278042    
## Countrybpoland                   0.087152 .  
## Countrybspain                    0.014640 *  
## Countrybsweden                   3.44e-05 ***
## Countrybswitzerland              0.017848 *  
## mobilityPG24:Countrybbelgium     0.521569    
## mobilityPG24:Countrybfrance      0.458928    
## mobilityPG24:Countrybgermany     0.059665 .  
## mobilityPG24:Countrybitaly       0.001312 ** 
## mobilityPG24:Countrybnetherlands 0.930847    
## mobilityPG24:Countrybpoland      0.248545    
## mobilityPG24:Countrybspain       0.058283 .  
## mobilityPG24:Countrybsweden      0.524725    
## mobilityPG24:Countrybswitzerland 0.435283    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                                  exp(coef) exp(-coef) lower .95 upper .95
## Q6Engineering and Technology        1.0608     0.9426    0.8457    1.3307
## Q6Humanities                        1.0406     0.9610    0.8237    1.3145
## Q6Medical Sciences                  0.8253     1.2116    0.6518    1.0451
## Q6Natural Sciences                  0.9292     1.0762    0.7409    1.1653
## Q6Social Sciences                   1.0937     0.9143    0.8719    1.3719
## mobilityPG24                        0.7766     1.2877    0.6808    0.8858
## Countrybbelgium                     0.6920     1.4452    0.5277    0.9074
## Countrybfrance                      1.1119     0.8994    0.9664    1.2792
## Countrybgermany                     0.5797     1.7251    0.4949    0.6790
## Countrybitaly                       0.8132     1.2297    0.7216    0.9164
## Countrybnetherlands                 0.9136     1.0946    0.7760    1.0756
## Countrybpoland                      0.7997     1.2504    0.6191    1.0331
## Countrybspain                       1.1956     0.8364    1.0358    1.3800
## Countrybsweden                      0.7416     1.3484    0.6438    0.8543
## Countrybswitzerland                 0.6612     1.5123    0.4696    0.9311
## mobilityPG24:Countrybbelgium        0.8756     1.1421    0.5832    1.3145
## mobilityPG24:Countrybfrance         0.9124     1.0960    0.7158    1.1630
## mobilityPG24:Countrybgermany        1.2651     0.7905    0.9905    1.6158
## mobilityPG24:Countrybitaly          1.3996     0.7145    1.1401    1.7182
## mobilityPG24:Countrybnetherlands    1.0111     0.9890    0.7874    1.2985
## mobilityPG24:Countrybpoland         1.2948     0.7723    0.8349    2.0082
## mobilityPG24:Countrybspain          0.7941     1.2592    0.6256    1.0081
## mobilityPG24:Countrybsweden         1.0747     0.9305    0.8608    1.3418
## mobilityPG24:Countrybswitzerland    1.1677     0.8564    0.7910    1.7238
## 
## Concordance= 0.6  (se = 0.006 )
## Rsquare= 0.041   (max possible= 1 )
## Likelihood ratio test= 247.2  on 24 df,   p=0
## Wald test            = 246.9  on 24 df,   p=0
## Score (logrank) test = 252.1  on 24 df,   p=0

La categoría de referencia es Q6= Agricultural Sciences, mobility=0, Countryb= UK

Predicción del riesgo para los que se mueven o no en Italia cuya disciplina sea Social Sciencies, ese riesgo siempre es con respecto a la categoría de referencia

dat3
##                Q6 mobilityPG24 Countryb
## 1 Social Sciences            0    italy
## 2 Social Sciences            1    italy
predict(mod.cph, newdata=dat3, type="risk", se.fit=TRUE)
## $fit
##        1        2 
## 1.104668 1.200650 
## 
## $se.fit
##          1          2 
## 0.05164945 0.08035408

Predicción del riesgo para los que se mueven o no en España cuya disciplina sea Social Sciencies

dat1
##                Q6 mobilityPG24 Countryb
## 1 Social Sciences            0    spain
## 2 Social Sciences            1    spain
predict(mod.cph, newdata=dat1, type="risk", se.fit=TRUE)
## $fit
##        1        2 
## 1.624184 1.001650 
## 
## $se.fit
##          1          2 
## 0.08111035 0.08951564

Considerando estratos los países, esto lo que hace es que considera que hay una función de supervivencia base distinta en cada país, pero que la relación entre las covariables es la misma en cada país

mod.cph1 <- coxph(S1 ~ Q6 + mobilityPG24 + strata(Countryb), 
    data = datos, method="breslow")

summary(mod.cph1)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + strata(Countryb), data = datos, 
##     method = "breslow")
## 
##   n= 5885, number of events= 4195 
##    (156 observations deleted due to missingness)
## 
##                                  coef exp(coef) se(coef)      z Pr(>|z|)
## Q6Engineering and Technology  0.03306   1.03361  0.11562  0.286    0.775
## Q6Humanities                  0.01006   1.01011  0.11945  0.084    0.933
## Q6Medical Sciences           -0.18403   0.83191  0.12055 -1.527    0.127
## Q6Natural Sciences           -0.07875   0.92427  0.11550 -0.682    0.495
## Q6Social Sciences             0.06076   1.06264  0.11575  0.525    0.600
## mobilityPG24                 -0.16693   0.84626  0.03399 -4.911 9.08e-07
##                                 
## Q6Engineering and Technology    
## Q6Humanities                    
## Q6Medical Sciences              
## Q6Natural Sciences              
## Q6Social Sciences               
## mobilityPG24                 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                              exp(coef) exp(-coef) lower .95 upper .95
## Q6Engineering and Technology    1.0336     0.9675    0.8240    1.2965
## Q6Humanities                    1.0101     0.9900    0.7993    1.2766
## Q6Medical Sciences              0.8319     1.2021    0.6568    1.0536
## Q6Natural Sciences              0.9243     1.0819    0.7370    1.1591
## Q6Social Sciences               1.0626     0.9411    0.8470    1.3333
## mobilityPG24                    0.8463     1.1817    0.7917    0.9046
## 
## Concordance= 0.562  (se = 0.016 )
## Rsquare= 0.01   (max possible= 1 )
## Likelihood ratio test= 59.34  on 6 df,   p=6.115e-11
## Wald test            = 58.44  on 6 df,   p=9.337e-11
## Score (logrank) test = 58.62  on 6 df,   p=8.592e-11
predict(mod.cph1, newdata=dat1, type="risk", se.fit=TRUE)
## $fit
##         1         2 
## 1.1484146 0.9718541 
## 
## $se.fit
##          1          2 
## 0.03485187 0.04126747
npsurv(mod.cph1)
## Call: npsurv(formula = mod.cph1)
## 
##                   n events median 0.95LCL 0.95UCL
## united kingdom 1291    937      5       4       5
## belgium         199    105      8       6      10
## france          564    440      3       3       4
## germany         576    370      8       7       9
## italy           905    765      5       5       6
## netherlands     487    346      5       4       6
## poland          150     99      6       3      12
## spain           650    443      4       3       5
## sweden          767    500      6       6       7
## switzerland     296    190      7       6       8
survplot(npsurv(mod.cph1), levels.only=TRUE, conf="none", lty=1:10, n.risk=TRUE)

comprobación asunciones , comprobar también para modelo con país como estrato

ph_fit_ps <-cox.zph(mod.cph)
round(ph_fit_ps$table,3)
##                                     rho   chisq     p
## Q6Engineering and Technology     -0.027   3.119 0.077
## Q6Humanities                     -0.048   9.891 0.002
## Q6Medical Sciences               -0.008   0.246 0.620
## Q6Natural Sciences                0.003   0.027 0.870
## Q6Social Sciences                -0.050  10.731 0.001
## mobilityPG24                      0.050  10.336 0.001
## Countrybbelgium                   0.027   3.101 0.078
## Countrybfrance                    0.056  13.345 0.000
## Countrybgermany                   0.090  33.904 0.000
## Countrybitaly                     0.115  54.937 0.000
## Countrybnetherlands              -0.017   1.231 0.267
## Countrybpoland                   -0.044   8.065 0.005
## Countrybspain                    -0.026   2.875 0.090
## Countrybsweden                    0.040   6.679 0.010
## Countrybswitzerland               0.013   0.717 0.397
## mobilityPG24:Countrybbelgium     -0.018   1.408 0.235
## mobilityPG24:Countrybfrance      -0.010   0.396 0.529
## mobilityPG24:Countrybgermany     -0.037   5.670 0.017
## mobilityPG24:Countrybitaly       -0.078  25.655 0.000
## mobilityPG24:Countrybnetherlands  0.007   0.196 0.658
## mobilityPG24:Countrybpoland       0.003   0.040 0.841
## mobilityPG24:Countrybspain        0.019   1.437 0.231
## mobilityPG24:Countrybsweden      -0.028   3.189 0.074
## mobilityPG24:Countrybswitzerland -0.001   0.002 0.962
## GLOBAL                               NA 265.937 0.000
plot(ph_fit_ps)

abline(h=0, lty=3)

mod.cph.test  <-  coxph(S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb, 
    data = datos)

ph_fit <- cox.zph(mod.cph.test, transform="rank", global=FALSE)
round(ph_fit$table,3)
##                                     rho  chisq     p
## Q6Engineering and Technology     -0.026  2.823 0.093
## Q6Humanities                     -0.047  9.256 0.002
## Q6Medical Sciences               -0.005  0.097 0.755
## Q6Natural Sciences                0.005  0.121 0.728
## Q6Social Sciences                -0.049 10.294 0.001
## mobilityPG24                      0.050 10.488 0.001
## Countrybbelgium                   0.027  3.079 0.079
## Countrybfrance                    0.052 11.593 0.001
## Countrybgermany                   0.091 34.565 0.000
## Countrybitaly                     0.116 55.819 0.000
## Countrybnetherlands              -0.018  1.416 0.234
## Countrybpoland                   -0.046  8.901 0.003
## Countrybspain                    -0.028  3.229 0.072
## Countrybsweden                    0.040  6.603 0.010
## Countrybswitzerland               0.015  0.958 0.328
## mobilityPG24:Countrybbelgium     -0.017  1.277 0.258
## mobilityPG24:Countrybfrance      -0.009  0.324 0.569
## mobilityPG24:Countrybgermany     -0.038  5.974 0.015
## mobilityPG24:Countrybitaly       -0.075 23.826 0.000
## mobilityPG24:Countrybnetherlands  0.008  0.247 0.619
## mobilityPG24:Countrybpoland       0.001  0.003 0.955
## mobilityPG24:Countrybspain        0.020  1.666 0.197
## mobilityPG24:Countrybsweden      -0.025  2.688 0.101
## mobilityPG24:Countrybswitzerland -0.003  0.027 0.870

Como no se cumple la hipótesis de PH, o bien se estratifica (que no nos interesa) o se introduce que alguna covariable dependa también del tiempo. o se parte la función de supervivencia en trozos (investigar)

datos.bis <- survSplit(datos, cut=c(2,5), end = "time", event = "status", start="start")
datos.bis$gt[datos.bis$start<=2] <- 1
datos.bis$gt[datos.bis$start>2 & datos.bis$start<=5 ] <- 2
datos.bis$gt[datos.bis$start>5 ] <- 3
datos.bis$gt <- factor(datos.bis$gt)
# datos.bis$gt[datos.bis$start>10] <- 4
# vemos que coincide si ajusto datos.bis con start coincide 
coxph(Surv(start,time,status)~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,data=datos.bis)
## Call:
## coxph(formula = Surv(start, time, status) ~ Q6 + mobilityPG24 + 
##     Countryb + mobilityPG24:Countryb, data = datos.bis)
## 
## 
##                                     coef exp(coef) se(coef)     z       p
## Q6Engineering and Technology      0.0591    1.0608   0.1156  0.51 0.60949
## Q6Humanities                      0.0398    1.0406   0.1192  0.33 0.73878
## Q6Medical Sciences               -0.1920    0.8253   0.1205 -1.59 0.11101
## Q6Natural Sciences               -0.0734    0.9292   0.1155 -0.64 0.52493
## Q6Social Sciences                 0.0896    1.0937   0.1156  0.77 0.43869
## mobilityPG24                     -0.2529    0.7766   0.0672 -3.77 0.00017
## Countrybbelgium                  -0.3682    0.6920   0.1383 -2.66 0.00774
## Countrybfrance                    0.1060    1.1119   0.0715  1.48 0.13828
## Countrybgermany                  -0.5453    0.5797   0.0807 -6.76 1.4e-11
## Countrybitaly                    -0.2068    0.8132   0.0610 -3.39 0.00069
## Countrybnetherlands              -0.0904    0.9136   0.0833 -1.08 0.27804
## Countrybpoland                   -0.2235    0.7997   0.1306 -1.71 0.08715
## Countrybspain                     0.1787    1.1956   0.0732  2.44 0.01464
## Countrybsweden                   -0.2989    0.7416   0.0722 -4.14 3.4e-05
## Countrybswitzerland              -0.4137    0.6612   0.1746 -2.37 0.01785
## mobilityPG24:Countrybbelgium     -0.1329    0.8756   0.2073 -0.64 0.52157
## mobilityPG24:Countrybfrance      -0.0917    0.9124   0.1238 -0.74 0.45893
## mobilityPG24:Countrybgermany      0.2351    1.2651   0.1248  1.88 0.05967
## mobilityPG24:Countrybitaly        0.3362    1.3996   0.1046  3.21 0.00131
## mobilityPG24:Countrybnetherlands  0.0111    1.0111   0.1276  0.09 0.93085
## mobilityPG24:Countrybpoland       0.2584    1.2948   0.2239  1.15 0.24854
## mobilityPG24:Countrybspain       -0.2305    0.7941   0.1217 -1.89 0.05828
## mobilityPG24:Countrybsweden       0.0720    1.0747   0.1133  0.64 0.52473
## mobilityPG24:Countrybswitzerland  0.1550    1.1677   0.1987  0.78 0.43528
## 
## Likelihood ratio test=247  on 24 df, p=0
## n= 12541, number of events= 4195 
##    (419 observations deleted due to missingness)
coxph(S1~Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,data=datos)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb, 
##     data = datos)
## 
## 
##                                     coef exp(coef) se(coef)     z       p
## Q6Engineering and Technology      0.0591    1.0608   0.1156  0.51 0.60949
## Q6Humanities                      0.0398    1.0406   0.1192  0.33 0.73878
## Q6Medical Sciences               -0.1920    0.8253   0.1205 -1.59 0.11101
## Q6Natural Sciences               -0.0734    0.9292   0.1155 -0.64 0.52493
## Q6Social Sciences                 0.0896    1.0937   0.1156  0.77 0.43869
## mobilityPG24                     -0.2529    0.7766   0.0672 -3.77 0.00017
## Countrybbelgium                  -0.3682    0.6920   0.1383 -2.66 0.00774
## Countrybfrance                    0.1060    1.1119   0.0715  1.48 0.13828
## Countrybgermany                  -0.5453    0.5797   0.0807 -6.76 1.4e-11
## Countrybitaly                    -0.2068    0.8132   0.0610 -3.39 0.00069
## Countrybnetherlands              -0.0904    0.9136   0.0833 -1.08 0.27804
## Countrybpoland                   -0.2235    0.7997   0.1306 -1.71 0.08715
## Countrybspain                     0.1787    1.1956   0.0732  2.44 0.01464
## Countrybsweden                   -0.2989    0.7416   0.0722 -4.14 3.4e-05
## Countrybswitzerland              -0.4137    0.6612   0.1746 -2.37 0.01785
## mobilityPG24:Countrybbelgium     -0.1329    0.8756   0.2073 -0.64 0.52157
## mobilityPG24:Countrybfrance      -0.0917    0.9124   0.1238 -0.74 0.45893
## mobilityPG24:Countrybgermany      0.2351    1.2651   0.1248  1.88 0.05967
## mobilityPG24:Countrybitaly        0.3362    1.3996   0.1046  3.21 0.00131
## mobilityPG24:Countrybnetherlands  0.0111    1.0111   0.1276  0.09 0.93085
## mobilityPG24:Countrybpoland       0.2584    1.2948   0.2239  1.15 0.24854
## mobilityPG24:Countrybspain       -0.2305    0.7941   0.1217 -1.89 0.05828
## mobilityPG24:Countrybsweden       0.0720    1.0747   0.1133  0.64 0.52473
## mobilityPG24:Countrybswitzerland  0.1550    1.1677   0.1987  0.78 0.43528
## 
## Likelihood ratio test=247  on 24 df, p=0
## n= 5885, number of events= 4195 
##    (156 observations deleted due to missingness)
mod.gt <- coxph(Surv(start,time,status)~Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb + mobilityPG24:gt + Q6:gt+ Countryb:gt,data=datos.bis)

ph_fit <- cox.zph(mod.gt, transform="rank")
round(ph_fit$table,3)
##                                     rho   chisq     p
## Q6Engineering and Technology      0.006   0.155 0.694
## Q6Humanities                     -0.007   0.211 0.646
## Q6Medical Sciences                0.007   0.235 0.628
## Q6Natural Sciences                0.022   2.101 0.147
## Q6Social Sciences                -0.010   0.410 0.522
## mobilityPG24                      0.055  12.592 0.000
## Countrybbelgium                   0.004   0.074 0.786
## Countrybfrance                    0.066  18.608 0.000
## Countrybgermany                   0.041   6.810 0.009
## Countrybitaly                     0.086  30.004 0.000
## Countrybnetherlands              -0.006   0.180 0.671
## Countrybpoland                   -0.020   1.816 0.178
## Countrybspain                    -0.011   0.486 0.486
## Countrybsweden                    0.015   0.964 0.326
## Countrybswitzerland               0.002   0.015 0.902
## mobilityPG24:Countrybbelgium     -0.015   0.926 0.336
## mobilityPG24:Countrybfrance      -0.005   0.104 0.747
## mobilityPG24:Countrybgermany     -0.035   5.094 0.024
## mobilityPG24:Countrybitaly       -0.072  22.334 0.000
## mobilityPG24:Countrybnetherlands  0.006   0.139 0.709
## mobilityPG24:Countrybpoland      -0.002   0.012 0.913
## mobilityPG24:Countrybspain        0.020   1.624 0.203
## mobilityPG24:Countrybsweden      -0.024   2.394 0.122
## mobilityPG24:Countrybswitzerland -0.001   0.005 0.943
## mobilityPG24:gt2                 -0.025   2.535 0.111
## Q6Engineering and Technology:gt2 -0.010   0.412 0.521
## Q6Humanities:gt2                 -0.003   0.034 0.853
## Q6Medical Sciences:gt2           -0.010   0.435 0.510
## Q6Natural Sciences:gt2           -0.022   2.016 0.156
## Q6Social Sciences:gt2             0.001   0.008 0.927
## Countrybbelgium:gt2               0.001   0.005 0.946
## Countrybfrance:gt2               -0.029   3.345 0.067
## Countrybgermany:gt2              -0.010   0.393 0.530
## Countrybitaly:gt2                -0.036   5.410 0.020
## Countrybnetherlands:gt2           0.000   0.000 1.000
## Countrybpoland:gt2                0.022   2.068 0.150
## Countrybspain:gt2                -0.005   0.106 0.745
## Countrybsweden:gt2               -0.011   0.492 0.483
## Countrybswitzerland:gt2          -0.006   0.158 0.691
## GLOBAL                               NA 144.034 0.000

¿Estratificar por mobility? no tendría sentido, perderíamos como cuantificar mobility

mod.cph.test  <-  coxph(S1 ~ Q6 +mobilityPG24 + Countryb  + mobilityPG24:Countryb   , 
    data = datos)

ph_fit <- cox.zph(mod.cph.test, transform="rank", global=FALSE)
round(ph_fit$table,3)
##                                     rho  chisq     p
## Q6Engineering and Technology     -0.026  2.823 0.093
## Q6Humanities                     -0.047  9.256 0.002
## Q6Medical Sciences               -0.005  0.097 0.755
## Q6Natural Sciences                0.005  0.121 0.728
## Q6Social Sciences                -0.049 10.294 0.001
## mobilityPG24                      0.050 10.488 0.001
## Countrybbelgium                   0.027  3.079 0.079
## Countrybfrance                    0.052 11.593 0.001
## Countrybgermany                   0.091 34.565 0.000
## Countrybitaly                     0.116 55.819 0.000
## Countrybnetherlands              -0.018  1.416 0.234
## Countrybpoland                   -0.046  8.901 0.003
## Countrybspain                    -0.028  3.229 0.072
## Countrybsweden                    0.040  6.603 0.010
## Countrybswitzerland               0.015  0.958 0.328
## mobilityPG24:Countrybbelgium     -0.017  1.277 0.258
## mobilityPG24:Countrybfrance      -0.009  0.324 0.569
## mobilityPG24:Countrybgermany     -0.038  5.974 0.015
## mobilityPG24:Countrybitaly       -0.075 23.826 0.000
## mobilityPG24:Countrybnetherlands  0.008  0.247 0.619
## mobilityPG24:Countrybpoland       0.001  0.003 0.955
## mobilityPG24:Countrybspain        0.020  1.666 0.197
## mobilityPG24:Countrybsweden      -0.025  2.688 0.101
## mobilityPG24:Countrybswitzerland -0.003  0.027 0.870
dat1
##                Q6 mobilityPG24 Countryb
## 1 Social Sciences            0    spain
## 2 Social Sciences            1    spain
predict(mod.cph.test, dat1, type="risk")
##        1        2 
## 1.624184 1.001650
summary(mod.cph.test)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb, 
##     data = datos)
## 
##   n= 5885, number of events= 4195 
##    (156 observations deleted due to missingness)
## 
##                                      coef exp(coef) se(coef)      z
## Q6Engineering and Technology      0.05907   1.06085  0.11564  0.511
## Q6Humanities                      0.03976   1.04057  0.11925  0.333
## Q6Medical Sciences               -0.19196   0.82534  0.12045 -1.594
## Q6Natural Sciences               -0.07344   0.92919  0.11552 -0.636
## Q6Social Sciences                 0.08955   1.09368  0.11564  0.774
## mobilityPG24                     -0.25287   0.77657  0.06716 -3.765
## Countrybbelgium                  -0.36824   0.69195  0.13828 -2.663
## Countrybfrance                    0.10604   1.11187  0.07154  1.482
## Countrybgermany                  -0.54527   0.57968  0.08068 -6.758
## Countrybitaly                    -0.20681   0.81318  0.06096 -3.392
## Countrybnetherlands              -0.09036   0.91360  0.08330 -1.085
## Countrybpoland                   -0.22346   0.79974  0.13063 -1.711
## Countrybspain                     0.17865   1.19561  0.07318  2.441
## Countrybsweden                   -0.29893   0.74161  0.07217 -4.142
## Countrybswitzerland              -0.41366   0.66122  0.17463 -2.369
## mobilityPG24:Countrybbelgium     -0.13287   0.87557  0.20732 -0.641
## mobilityPG24:Countrybfrance      -0.09170   0.91238  0.12382 -0.741
## mobilityPG24:Countrybgermany      0.23511   1.26505  0.12484  1.883
## mobilityPG24:Countrybitaly        0.33619   1.39960  0.10462  3.213
## mobilityPG24:Countrybnetherlands  0.01108   1.01114  0.12763  0.087
## mobilityPG24:Countrybpoland       0.25837   1.29481  0.22391  1.154
## mobilityPG24:Countrybspain       -0.23049   0.79415  0.12172 -1.894
## mobilityPG24:Countrybsweden       0.07204   1.07470  0.11326  0.636
## mobilityPG24:Countrybswitzerland  0.15504   1.16771  0.19872  0.780
##                                  Pr(>|z|)    
## Q6Engineering and Technology     0.609491    
## Q6Humanities                     0.738785    
## Q6Medical Sciences               0.111006    
## Q6Natural Sciences               0.524933    
## Q6Social Sciences                0.438692    
## mobilityPG24                     0.000166 ***
## Countrybbelgium                  0.007744 ** 
## Countrybfrance                   0.138276    
## Countrybgermany                  1.40e-11 ***
## Countrybitaly                    0.000693 ***
## Countrybnetherlands              0.278042    
## Countrybpoland                   0.087152 .  
## Countrybspain                    0.014640 *  
## Countrybsweden                   3.44e-05 ***
## Countrybswitzerland              0.017848 *  
## mobilityPG24:Countrybbelgium     0.521569    
## mobilityPG24:Countrybfrance      0.458928    
## mobilityPG24:Countrybgermany     0.059665 .  
## mobilityPG24:Countrybitaly       0.001312 ** 
## mobilityPG24:Countrybnetherlands 0.930847    
## mobilityPG24:Countrybpoland      0.248545    
## mobilityPG24:Countrybspain       0.058283 .  
## mobilityPG24:Countrybsweden      0.524725    
## mobilityPG24:Countrybswitzerland 0.435283    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                                  exp(coef) exp(-coef) lower .95 upper .95
## Q6Engineering and Technology        1.0608     0.9426    0.8457    1.3307
## Q6Humanities                        1.0406     0.9610    0.8237    1.3145
## Q6Medical Sciences                  0.8253     1.2116    0.6518    1.0451
## Q6Natural Sciences                  0.9292     1.0762    0.7409    1.1653
## Q6Social Sciences                   1.0937     0.9143    0.8719    1.3719
## mobilityPG24                        0.7766     1.2877    0.6808    0.8858
## Countrybbelgium                     0.6920     1.4452    0.5277    0.9074
## Countrybfrance                      1.1119     0.8994    0.9664    1.2792
## Countrybgermany                     0.5797     1.7251    0.4949    0.6790
## Countrybitaly                       0.8132     1.2297    0.7216    0.9164
## Countrybnetherlands                 0.9136     1.0946    0.7760    1.0756
## Countrybpoland                      0.7997     1.2504    0.6191    1.0331
## Countrybspain                       1.1956     0.8364    1.0358    1.3800
## Countrybsweden                      0.7416     1.3484    0.6438    0.8543
## Countrybswitzerland                 0.6612     1.5123    0.4696    0.9311
## mobilityPG24:Countrybbelgium        0.8756     1.1421    0.5832    1.3145
## mobilityPG24:Countrybfrance         0.9124     1.0960    0.7158    1.1630
## mobilityPG24:Countrybgermany        1.2651     0.7905    0.9905    1.6158
## mobilityPG24:Countrybitaly          1.3996     0.7145    1.1401    1.7182
## mobilityPG24:Countrybnetherlands    1.0111     0.9890    0.7874    1.2985
## mobilityPG24:Countrybpoland         1.2948     0.7723    0.8349    2.0082
## mobilityPG24:Countrybspain          0.7941     1.2592    0.6256    1.0081
## mobilityPG24:Countrybsweden         1.0747     0.9305    0.8608    1.3418
## mobilityPG24:Countrybswitzerland    1.1677     0.8564    0.7910    1.7238
## 
## Concordance= 0.6  (se = 0.006 )
## Rsquare= 0.041   (max possible= 1 )
## Likelihood ratio test= 247.2  on 24 df,   p=0
## Wald test            = 246.9  on 24 df,   p=0
## Score (logrank) test = 252.1  on 24 df,   p=0
anova(mod.cph.test)
## Analysis of Deviance Table
##  Cox model: response is S1
## Terms added sequentially (first to last)
## 
##                       loglik   Chisq Df Pr(>|Chi|)    
## NULL                  -33037                          
## Q6                    -33012  51.533  5  6.726e-10 ***
## mobilityPG24          -32990  43.268  1  4.774e-11 ***
## Countryb              -32928 124.650  9  < 2.2e-16 ***
## mobilityPG24:Countryb -32914  27.701  9    0.00107 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1